Vol. 69, No. g 


Whole No. 388, 1955 


Psychological Monographs: General and Applied 


Studies of Scale and Ambiguity Values Obtained by the 
Method of Equal-Appearing Intervals¹


Sam C. Webb 
Emory University 


I. INTRODUCTION 


THOUGH the method of equal-appearing intervals has been widely used in constructing attitude scales, a recent review of
the literature (12) suggests there still re- 
main aspects of it which have not been sys- 
tematically evaluated. This monograph 
presents a series of studies which provide 
additional data concerning some aspects of 
the method. Specifically, data are provided 
concerning the following six topics: 

1. Efficiency and reliability of estimators 
of scale and ambiguity values. 

2. Interactive effects of number of 
judges, number of intervals, and efficiency 
of estimating formulae on the reliability of 
scale and ambiguity values. 

3. Intercorrelations for scale and am- 
biguity values. 

4. Effects of number of judges, number of
intervals, and efficiency of estimating for- 
mulae on the relation between scale and 
ambiguity values. 

5. Relation of scale and ambiguity values 
obtained by the procedures of this study to 
the values obtained by Thurstone and 
Chave. 


¹ This study has been supported by a grant from
the Emory University Research Committee. The 
author is indebted to the staffs of the psychology 
departments of the Georgia Institute of Technol- 
ogy and the University of Georgia, and to the 
staffs of the psychology and sociology depart- 
ments of the Atlanta Division of the University of 
Georgia and of Emory University for cooperation 
in the collection of data for this study. 


6. Relevance of scale and ambiguity 
values for selecting items for Guttman-type scales.


II. ASSEMBLING THE DATA


The general intent of most of these 
studies was to investigate the effect of three 
factors—efficiency of estimating formulae, 
number of judges, and number of judging 
intervals—upon various aspects of Thur- 
stone-type scale and ambiguity values. For 
convenience these factors are hereafter re- 
ferred to as formulae, judges, and intervals. 

The first step was that of deciding how 
these factors should be varied. A considera- 
tion of previous studies suggested that 
three different numbers of intervals, three 
different numbers of judges, and four dif- 
ferent formulae for computing scale and 
ambiguity values should be employed. The 
number of intervals selected were 11, 7, 
and 5. The number of judges selected were 
25, 50, and 100. The computing formulae 
were the mean and three percentile esti- 
mates of the mean for scale values, and 
the standard deviation and three percentile 
estimates of the standard deviation for am- 
biguity values. These computing formulae 
are described in more detail in the next 
section. 

The second step was that of selecting a set of attitude items. For this study the 130 statements of attitude toward the church of Thurstone and Chave (10) seemed appropriate. This set had the advantages of being relatively large in number and of having adequate representation in all intervals of the scale.²

The third step was that of assimilating a 
pool of judgments of these items from 
which scale and ambiguity values based on 
various judges-intervals-formulae combi- 
nations could be obtained. 


A. Collection of Judgment Data 


The method of collecting the judgments 
of the items was a modification of that used 
by Thurstone and Chave. Judgments using
11, 7, and 5 intervals were recorded on 
specially prepared IBM answer sheets 
which provided appropriate numbers of 
identified intervals. The original Thurstone 
and Chave directions (10), modified to 
provide for recording ratings on the answer 
sheet, were used. Ratings were made by 
groups of 20 to 30 students during a 50-minute class period. The directions were
read aloud to the students, and all ques- 
tions concerning them were answered be- 
fore the rating of the items started. Special 
precautions which were taken to insure a 
minimum of error associated with a failure 
to follow directions have been described
elsewhere (13). 


B. Subjects 


The 11-interval data were collected
from 243 students enrolled in psychology 
classes at the Georgia Institute of Technol- 
ogy. The 7-interval data were collected 
from 244 students enrolled in psychology 
classes at the University of Georgia; and 
the 5-interval data were collected from 225 
students enrolled in psychology and so- 
ciology classes at Emory University. 


C. Selection of Subsamples

All sortings were next examined for irregularities; on the basis of a classification of irregularities (13), sortings with the most serious irregularities were eliminated until the number of sortings remaining in each set of data was reduced to 200.

² While Edwards (1) has demonstrated the ineffectiveness of neutral items, these items were included to prevent gaps in the distributions of indices of scale value.

Sortings in each group were then numbered 1 to 200. Finally, for each set of data, samples of 25, 50, and 100 sortings were selected according to the following procedure. Using a table of random numbers, 25 sortings were drawn from the pool of 200 sortings for the 11-interval data. This subsample was numbered 11-I-25 to indicate it was composed of 11-interval sortings and that it was sample number 1 of size 25. Without replacing these sortings, a second set of 25 sortings was drawn by using the table of random numbers and labeled 11-II-25 to indicate it was the second sample of size 25. These two samples were combined to form sample 11-I-50, the first sample of size 50; and a second sample of size 50, labeled 11-II-50, consisted of 50 sortings selected at random from the remaining 150 sortings in the pool. Finally the two samples of size 50 were combined to form a sample labeled 11-I-100. The remaining 100 sortings formed sample 11-II-100. Similar procedures were followed in selecting samples for the 7-interval and 5-interval data. The procedure therefore yielded a total of 18 samples, two for each of the nine interval-judge combinations. For each sample the frequency with which each item fell in each category was determined. Cumulative proportions were determined, ogives drawn, and scale and ambiguity values computed by procedures described in the following section.³
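
The nested drawing procedure just described can be sketched in code. This is only an illustration under stated assumptions: a seeded pseudo-random generator stands in for the table of random numbers, and the function name `draw_nested_samples`, its parameters, and the seed are hypothetical and not part of the original study.

```python
import random


def draw_nested_samples(pool_ids, interval_label, seed=0):
    """Split a pool of 200 sortings into the six samples described in the text.

    Shuffling once and slicing is equivalent to drawing successive samples
    without replacement. The seeded generator is an assumption standing in
    for the table of random numbers used in the study.
    """
    ids = list(pool_ids)
    if len(ids) != 200:
        raise ValueError("the procedure assumes a pool of exactly 200 sortings")
    random.Random(seed).shuffle(ids)

    first_25, second_25 = ids[:25], ids[25:50]
    first_50 = first_25 + second_25    # samples I-25 and II-25 combined
    second_50 = ids[50:100]            # drawn from the remaining 150 sortings
    first_100 = first_50 + second_50   # the two samples of size 50 combined
    second_100 = ids[100:]             # the remaining 100 sortings

    k = interval_label
    return {
        f"{k}-I-25": first_25, f"{k}-II-25": second_25,
        f"{k}-I-50": first_50, f"{k}-II-50": second_50,
        f"{k}-I-100": first_100, f"{k}-II-100": second_100,
    }


samples = draw_nested_samples(range(1, 201), interval_label=11)
```

Under this scheme the I and II samples of any one size share no judges, so a correlation between them compares two independent groups, while samples of different sizes are nested.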


III. EFFICIENCY AND RELIABILITY OF
ESTIMATORS OF SCALE AND
AMBIGUITY VALUES
In developing the method of equal-appearing intervals Thurstone assumed that the distribution of judgments for each item when plotted on the attitude scale is described by the phi-gamma function (9). While this assumption implies that the mean and standard deviation should be the most reliable estimates of scale and ambiguity value, the median and interquartile range have generally been used for computational purposes. Except for a comparison of the mean and median as estimates of scale value, little attention has been directed toward the problem of determining the need for, or possibility of, using other statistics as indices of scale and ambiguity values.

³ Items left unrated by subjects were arbitrarily placed in the neutral category.


A. Purpose 


On the basis of Thurstone’s assumption 
of the distribution of judgments for each 
item plus the demonstrations by Kelley 
(7), Mosteller (8), and Yost (15) that the 
efficiency of various combinations of per- 
centile values for estimating the mean and 
standard deviation of a normal supply in- 
creases as the properly selected number of 
percentile points included in the estimating 
formula increases, it should follow that the 
mean and standard deviation should be 
the most reliable estimates of scale and 
ambiguity value and that estimates of these 
based on successively less efficient estimat- 
ing formulae should show successively less 
reliability. The purpose of this study there- 
fore was to investigate to what extent the 
use of successively more efficient estimating 
formulae would be accompanied by suc- 
cessively higher reliability of the estimates. 
Attention is focused primarily on the order- 
ing of the estimates of reliability according 
to magnitude. The problem of significance 
of differences is reserved for a later section. 


B. Choice of Formulae and Procedures 


For measures of scale value the mean and three estimating formulae based on percentile values were used. These formulae and the identifying symbols for them used in this study were M₁ = P₅₀, M₂, and M₃ = .333(P₂₀ + P₅₀ + P₈₀). These have an efficiency of .64, .81, and .88 respectively (8). For measures of ambiguity the standard deviation and three estimating formulae based on percentile values were used. These formulae and the identifying symbols for them used in this study were S₁, S₂ = .3388(P₉₃ - P₀₇), and S₃ = .2157(P₉₃ + P₈₀ - P₂₀ - P₀₇). These have an efficiency of .37, .65, and .75 respectively (8).

For each of the two samples containing 
100 judges for the 11-, 7-, and 5-interval 
data, percentile values required for the 
three estimating formulae were obtained 
by dropping perpendiculars at the speci- 
fied points. It should be noted that Thur- 
stone and Chave’s procedure of extrapolat- 
ing for Ps and of doubling the quartile dis- 
tance for estimating ambiguity values for 
extreme items was not followed. 
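
The percentile estimates can be illustrated with a short sketch. Several points here are assumptions rather than the author's procedure: percentiles are obtained by linear interpolation within intervals instead of being read from a drawn ogive; interval k is taken to span k - 0.5 to k + 0.5; the form used for M₂, .5(P₂₅ + P₇₅), is a guess at a two-percentile estimate; and the constant .7413 for S₁ is the usual normal-theory multiplier for the interquartile range, not a value quoted in the text. The function names are hypothetical.

```python
def percentile(counts, p):
    """Percentile p (0 < p < 1) by linear interpolation on the cumulative
    proportions. counts[k - 1] is the number of judges placing the item in
    interval k; interval k is assumed to span k - 0.5 to k + 0.5.
    """
    total = float(sum(counts))
    below = 0.0
    for k, c in enumerate(counts, start=1):
        upto = below + c / total
        if upto >= p:
            return (k - 0.5) + (p - below) / (upto - below)
        below = upto
    return len(counts) + 0.5


def estimates(counts):
    """Percentile estimates of scale value (M) and ambiguity (S) for one item."""
    def P(q):
        return percentile(counts, q)

    return {
        "M1": P(.50),
        "M2": .5 * (P(.25) + P(.75)),      # assumed form, not given in the text
        "M3": .333 * (P(.20) + P(.50) + P(.80)),
        "S1": .7413 * (P(.75) - P(.25)),   # assumed normal-theory constant
        "S2": .3388 * (P(.93) - P(.07)),
        "S3": .2157 * (P(.93) + P(.80) - P(.20) - P(.07)),
    }


# Hypothetical item judged by 10 judges on a 5-interval scale.
example = estimates([1, 2, 4, 2, 1])
```

For a symmetric distribution such as this one, the three scale-value estimates agree closely; they diverge for skewed items, which is where the choice of formula matters.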

While the general hypotheses to be tested in this study were that the ordering of reliability of estimates of scale value should be M > M₃ > M₂ > M₁ and that the ordering of reliability of estimates of ambiguity should be S > S₃ > S₂ > S₁, it was necessary to state these hypotheses slightly differently for the two procedures used in treating the data.

According to the first procedure the 
three percentile estimates of scale value 
were correlated with the item means and 
the three percentile estimates of ambiguity 
were correlated with the item standard de- 
viations for each sample. These results are 
shown in Table 1. In terms of these data 
each successively more efficient percentile 
estimate of scale and ambiguity value 
should provide successively higher correla- 
tions with the mean and standard devia- 
tion respectively. Accordingly hypotheses 
could be stated as follows: Hypothesis I. The ordering of the correlations of M₁, M₂, and M₃ with M is r(M vs. M₃) > r(M vs. M₂) > r(M vs. M₁). Hypothesis II. The ordering of the correlations of S₁, S₂, and S₃ with S is r(S vs. S₃) > r(S vs. S₂) > r(S vs. S₁).

In the second procedure values for the mean, standard deviation, and percentile estimates of these for the two samples of each interval size were correlated so as to provide reliability coefficients for each estimate of scale and ambiguity value. These data are shown in Table 2. On the basis of this procedure two hypotheses could be stated as follows: Hypothesis III. The ordering of the reliability coefficients for estimates of scale value should be M > M₃ > M₂ > M₁. Hypothesis IV. The ordering of the reliability coefficients for estimates of ambiguity value should be S > S₃ > S₂ > S₁.

In view of the interest in the literature in the comparability of median vs. mean and standard deviation vs. interquartile range as measures of scale and ambiguity value, two additional hypotheses could be tested on the basis of these data. Hypothesis V. There is no difference in reliability between the mean and median as estimates of scale value. Hypothesis VI. There is no difference in reliability between the standard deviation and the interquartile range as estimators of ambiguity value.


TABLE 1

CORRELATIONS OF PERCENTILE ESTIMATES OF SCALE AND AMBIGUITY
VALUE WITH MEANS AND STANDARD DEVIATIONS

                   Sample I-100              Sample II-100
               M vs.        S vs.         M vs.       S vs.
11 Interval    M₁ .9979     S₁ .732       M₁          S₁ .760
               M₂ .9991     S₂ .907       M₂          S₂ .928
               M₃ .9994     S₃ .893       M₃          S₃ .922
 7 Interval    M₁ .9954     S₁ .757       M₁          S₁ .722
               M₂ .99856    S₂ .904       M₂          S₂ .891
               M₃ .99859    S₃ .885       M₃          S₃ .862
 5 Interval    M₁ .9965     S₁ .758       M₁          S₁ .676
               M₂ .997      S₂ .864       M₂          S₂ .884
               M₃ .9982     S₃ .885       M₃          S₃ .853

Note. The Sample II-100 correlations with M are only partly legible and cannot be assigned to rows; the surviving entries are .9989, .9992, .9968, .9973, .9988, and .9990.


TABLE 2

RELIABILITY COEFFICIENTS FOR ESTIMATORS
OF SCALE AND AMBIGUITY VALUE

Columns: 11 Interval, 7 Interval, 5 Interval.

Note. The row and column positions of the entries are not recoverable. The legible coefficients, in the order in which they survive, are:
Scale value: .9925, .9920, .9906, .9952, .9949, .9950, .9949, .9979, .9976, .9959
Ambiguity value: .749, .860, .778, .721, .710, .711, .800, .894, .870, .841


C. Results


The data of Table 1 show that for all six samples the ordering of the correlations of the percentile estimates with the mean is in the predicted order. Hypothesis I is therefore confirmed. Since the ordering of the correlations of percentile estimates with the standard deviation follows the predicted order in only one of the six samples, Hypothesis II is not confirmed.

The data in Table 2 show that the estimates of scale value fall in the predicted order only once in three trials. Since the probability of obtaining the predicted ordering at least once in three replications by chance is .115, Hypothesis III is rejected. Since the specified order of reliability coefficients for the estimates of ambiguity does not appear at all, Hypothesis IV is not sustained. To test Hypothesis V the coefficients for M and M₁ of Table 2 were converted to Fisher's z. On the basis of this transformation an analysis of variance was performed which yielded an F of 4.187. Since a one-tailed test is appropriate for testing this hypothesis, the F was converted to t = 2.04, which for df = 2 is significant at the 5% level of confidence. Since S is predicted to be greater than S₁, and since it was greater only once in three trials, Hypothesis VI also is not confirmed.
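
The chance probability used in testing Hypothesis III can be worked through as follows. This is a sketch under assumptions the text does not state: that the 24 possible orderings of the four estimates are equally likely by chance, and that the three interval sets are independent. The variable names are illustrative only.

```python
from fractions import Fraction
from math import factorial

# Chance that four estimates fall in one specified order (assumed 1 in 4!).
p_order = Fraction(1, factorial(4))
q = 1 - p_order

# Three replications, one per number of intervals (assumed independent).
p_exactly_once = 3 * p_order * q ** 2
p_at_least_once = 1 - q ** 3

print(round(float(p_exactly_once), 3), round(float(p_at_least_once), 3))
```

On these assumptions the quoted value of .115 corresponds to the predicted order appearing exactly once in three replications; the probability of its appearing at least once is slightly larger, about .120.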


D. Discussion 


Despite the fact that only two of the 
three hypotheses relative to estimates of 
scale value were confirmed, it seems safe to 
assert that the data show a trend for scale 
values computed from successively more 
efficient formulae to show a successively 
higher reliability. However, since even the 
least efficient estimate M₁ provides reliability coefficients greater than .99, the
increased reliability of scale values result- 
ing from the use of more efficient formulae 
is of little practical consequence. 

While none of the hypotheses concerning the ambiguity values were sustained, the data suggest that two of the computing formulae, S₂ and S₃, yield more reliable estimates of ambiguity than are afforded by the commonly used interquartile range. It is not clear, however, to what extent the violation of the assumption of normality of distribution of judgments per item, caused by a lack of correction for the influence of the end effect, has prevented the reliabilities of these formulae from being higher than the obtained values. It is possible, as Kelley (7) has suggested, that if some other form of distribution which more nearly represents the modal-type distribution of judgments for attitude items were used as the basis of deriving estimating formulae, it might be possible to obtain higher reliabilities than those obtained by the formulae used here.

An unexpected finding about ambiguity values was that in two out of three sets of data the reliability of S was lower than that of S₁. Inspection of the scattergrams shows that a partial cause of these low coefficients is a wide discrepancy of the two S values for a few items. In every case these items proved to be extreme items which a few judges of one sample but not the other had judged to fall at one extreme of the continuum, while all other judges had judged them to fall at the other end. It appears, therefore, that the great sensitivity of S to extreme deviates makes it unsuitable as an estimate of ambiguity for data collected by the method of equal-appearing intervals.
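
The sensitivity of S to a few extreme placements can be illustrated with invented data. In this sketch, which is not drawn from the study, 100 judges rate an extreme item on an 11-interval scale; in the second distribution three judges place it at the opposite end. Percentiles are interpolated linearly within intervals, which is an assumption, and all names are hypothetical.

```python
from statistics import pstdev


def percentile(counts, p):
    """Percentile p by linear interpolation; interval k spans k - 0.5 to k + 0.5."""
    total = float(sum(counts))
    below = 0.0
    for k, c in enumerate(counts, start=1):
        upto = below + c / total
        if upto >= p:
            return (k - 0.5) + (p - below) / (upto - below)
        below = upto
    return len(counts) + 0.5


def expand(counts):
    """Turn interval frequencies into a flat list of individual ratings."""
    return [k for k, c in enumerate(counts, start=1) for _ in range(c)]


def quartile_range(counts):
    return percentile(counts, .75) - percentile(counts, .25)


# Hypothetical frequencies for intervals 1 to 11 (100 judges each).
agreed = [60, 30, 10, 0, 0, 0, 0, 0, 0, 0, 0]
three_reversed = [57, 30, 10, 0, 0, 0, 0, 0, 0, 0, 3]

sd_ratio = pstdev(expand(three_reversed)) / pstdev(expand(agreed))
range_ratio = quartile_range(three_reversed) / quartile_range(agreed)
```

With these invented figures the standard deviation more than doubles while the interquartile range changes by less than a tenth, which is the kind of discrepancy between samples described above.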


IV. EFFECTS OF NUMBER OF JUDGES, NUMBER
OF SCALE INTERVALS, AND EFFICIENCY OF
ESTIMATING FORMULAE ON THE RELIABILITY
OF SCALE AND AMBIGUITY VALUES


The purpose of this study was to investi- 
gate within the framework of a three-di- 
mensional factorial design the main and 
interaction effects of three variables on the 
reliability of scale and ambiguity values. 
The variables selected were number of 
judges, number of scale intervals, and estimating 
formulae. 

On the basis of previous research reports 
and statistical theory it was expected that 
scale and ambiguity values would show in- 
creased reliability as a function of increas- 
ing number of judges, increasing number 
of scale intervals, and increasing efficiency 
of estimating formulae. As has previously 
been stated 11-, 7-, and 5-interval scales 
and groups of 25, 50, and 100 judges were 
used. On the basis of the results of the preceding study, three estimating formulae for scale values (M₁, M₂, and M₃) and three estimating formulae for ambiguity values (S₁, S₂, and S₃) were used. These formulae have been described in section III.


A. Design and Procedure 


For each of the two samples for each of the nine judge-interval combinations, three estimates of the scale and ambiguity values were computed by procedures already described. The reliability of the various estimates of scale and ambiguity values for each judge-interval-formula combination was then determined by computing the Pearson product-moment correlation between the appropriate values for the 130 items of the two samples for each judge-interval-formula combination. The resulting coefficients for scale and ambiguity values are shown in Tables 3 and 4, respectively. Finally the coefficients of each table were transformed into Fisher's z, and an analysis of variance was performed on these transformed values. The results of these analyses are shown in Tables 5 and 6.
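
The two computations named in this procedure, the product-moment reliability coefficient and its Fisher z transformation, can be sketched briefly. The data below are invented four-item vectors, not values from the study (which used 130 items), and the function names are hypothetical.

```python
from math import atanh, sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_z(r):
    """Fisher's z transformation, z = 0.5 * ln((1 + r) / (1 - r))."""
    return atanh(r)


# Hypothetical scale values for the same four items in samples I and II.
sample_I = [1.0, 2.0, 3.0, 4.0]
sample_II = [1.0, 3.0, 2.0, 4.0]

r = pearson_r(sample_I, sample_II)
z = fisher_z(r)
```

The transformation matters most for coefficients near 1.00, where equal differences in r correspond to increasingly large differences in z; this is presumably why it precedes the analysis of variance.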


B. Results 


Scale Values. All the reliability coefficients 
for scale values fell within the narrow range 
of .9760 and .9964. The consistency of
ordering of these coefficients across col- 
umns in the direction of increased reliabil- 
ity with increase in efficiency of the com- 
puting formula, across rows in the direction 
of increased reliability with increase in 
number of judges, and across blocks in the 
direction of increased reliability with in- 
creasing number of intervals, should be 
noted. 

When tested against the triple interac- 
tion term as error, only one of the three 
first-order interactions (judges by inter- 
vals) is significant. This one interaction is 
significant at a 1% level of confidence. As 
for the main effects, when the variance for 
columns (formulae) is tested against the 
triple interaction term as error, the null 


hypothesis of no difference among the column means is rejected at a 5% level of confidence. Though tests of the null hypotheses of no difference among means for judges and no difference among means for intervals cannot, in the strictest sense, be made because of the significant judges-by-intervals interaction, it nevertheless seems fair to assert that the main effects of judges and intervals are significant sources of variance. This assertion is based on the fact that highly significant F ratios are obtained by dividing the judges and intervals variances by the variance of the appropriate interaction terms. These high ratios are no doubt a function of the consistency of ordering across rows, columns, and blocks noted above.

Finally, t tests between the z values for the various pairs of each of the three main effects were computed. Using judges-by-intervals interaction as error, the differences between all possible pairings of the three levels for judges were found to be significant at a .001 level of confidence (two-tailed test). These differences were all in the expected directions.


TABLE 3

RELIABILITY OF SCALE VALUES FOR THE
INTERVAL-JUDGE-FORMULA COMBINATIONS

No. of       No. of         Estimating Formulae
Intervals    Judges      M₁       M₂       M₃
11           25
             50          .9927    .9942
             100         .9956    .9904
7            25          .9804    .9820
             50          .9896    .9924
             100         .9959    .9949
5            25          .9760    .9764    .9779
             50          .9825    .9832
             100         .9906    .9913    .9920

Note. Rows showing fewer than three entries have lost the remainder, and the column positions of the surviving entries in those rows are uncertain. Two further entries, .9866 and .9856, survive without a row.


TABLE 4

RELIABILITY OF AMBIGUITY VALUES FOR
INTERVAL-JUDGE-FORMULA COMBINATIONS

No. of       No. of         Estimating Formulae
Intervals    Judges      S₁       S₂       S₃
11           100                           .870
7            25          .571
             50          .688     .686
             100         .807     .804     .860
5            25          .573
             50          .694     .733
             100                           .778

Note. Rows showing fewer than three entries have lost the remainder, and the column positions of the surviving entries in those rows are uncertain. Four further entries, .560, .825, .607, and .762, survive without a row.


TABLE 5

ANALYSIS OF VARIANCE OF FISHER'S z TRANSFORMATION OF SCALE
VALUE RELIABILITY COEFFICIENTS

Source                df    Sum of Squares    Variance    F* (level of significance)
Rows (Judges)          2      1.592551        .796275     F_RB = 47.613 (.001);  F_RC = 560.787 (.001)
Blocks (Intervals)     2       .691235        .345618     F_BC = 888.478 (.001); F_RB = 20.666 (.01)
Columns (Formulae)     2       .016861        .008431     F_RBC = 6.227 (.05)
R×B                    4       .066895        .016724     F_RBC = 12.352 (.01)
R×C                    4       .005678        .001420     F_RBC = 1.049
B×C                    4       .001555        .000389     F_RBC = .287 (-)
R×B×C                  8       .010833        .001354
Total                 26      2.385608

* Subscript indicates interaction used as error variance in F ratio.


TABLE 6

ANALYSIS OF VARIANCE OF FISHER'S z TRANSFORMATION OF AMBIGUITY
VALUE RELIABILITY COEFFICIENTS

Source                df    Sum of Squares    Variance    F* (level of significance)
Rows (Judges)          2      1.249510        .624755     F_RC = 154.834 (.001); F_RB = 16.602 (.05)
Blocks (Intervals)     2       .160822        .080411     F_BC = 16.085 (.05);   F_RB = 2.137 (-)
Columns (Formulae)     2       .069218        .034609     F_RC = 8.577 (.05);    F_BC = 6.923 (.05)
R×B                    4       .150528        .037632     F_RBC = 163.617 (.001)
R×C                    4       .016140        .004035     F_RBC = 17.543 (.001)
B×C                    4       .019997        .004999     F_RBC = 21.735 (.001)
R×B×C                  8       .001842        .000230
Total                 26      1.668056

* Subscripts indicate interaction used as error variance in F ratio.

Again using this same interaction as 
error, the differences between all possible 
pairings of the three levels for intervals 
were tested and found significant at a 5% 
or higher level of confidence. These differ- 
ences were all in the expected directions. 
Using the second-order interaction as 
error, differences between all possible pair- 
ings of formulae were found to be signifi- 
cant at a 1% level of confidence in the 
expected directions. 

Ambiguity Values. The reliability coeffi- 
cients for ambiguity values ranged from 
.517 to .894. In Table 4 a high consistency
of ordering is noted only across rows, where 
increased reliability with increased num- 
ber of judges is found. Especially among 
the coefficients based on 25 judges there 
appears to be little systematic variation. 

Using the second-order interaction as error, all first-order interactions are significant at a .001 level of confidence. Though no tests of the null hypothesis of no differences among means for judges, intervals, or formulae can, in the strictest sense, be made because of the significant interactions, the fact that all but one of the ratios obtained by dividing the variances for main effects by the appropriate first-order interactions are fairly large (significant at a 5% level of confidence) suggests that the main effects are significant sources of variance over and above the variance contributed by the interactive effects.⁴ For this reason the significance of difference among main effects was further investigated by the use of t tests. When both judges-by-formulae and intervals-by-formulae interactions are used as error the differences between the average z values for the three formulae were all in the expected directions; S₁ was not found significantly different from S₂ (two-tailed test), but S₃ was found significantly different from S₁ and S₂.

⁴ The F ratio for formulae over intervals-by-formulae interaction actually falls below the 5% value by .017. Because of this slight difference it has for practical purposes been called significant at a 5% level of confidence.

Using judges-by-intervals and judges- 
by-formulae variance as error, all differ- 
ences between judges were in the expected 
direction and all were significant at a 5% 
or higher level of confidence. All differ- 
ences between number of intervals were in 
the expected directions; but using judges- 
by-intervals interaction as error the aver- 
age z for 11 intervals was significantly dif- 
ferent from average z for 5 intervals at a 
5% or higher level. Only when intervals- 
by-formulae interaction was used as error 
were all differences significant at a 5% or 
higher level of confidence. 
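
The F ratios discussed above are quotients of a main-effect variance and the variance of the interaction chosen as error. The following sketch recomputes several of them from variances legible in Table 6; the dictionary keys and the function name `f_ratio` are illustrative only, and the blocks (intervals) variance is omitted because it is not legible in the scan.

```python
# Variances (mean squares) as printed in Table 6 for ambiguity values.
variance = {
    "rows": .624755,      # judges
    "columns": .034609,   # formulae
    "RxB": .037632,       # judges by intervals
    "RxC": .004035,       # judges by formulae
    "BxC": .004999,       # intervals by formulae
    "RxBxC": .000230,     # second-order interaction
}


def f_ratio(effect, error):
    """F ratio of one variance over the variance chosen as the error term."""
    return variance[effect] / variance[error]


judges_over_RxB = f_ratio("rows", "RxB")
judges_over_RxC = f_ratio("rows", "RxC")
formulae_over_RxC = f_ratio("columns", "RxC")
formulae_over_BxC = f_ratio("columns", "BxC")
RxB_over_RxBxC = f_ratio("RxB", "RxBxC")
```

The recomputed quotients agree with the printed ratios to within rounding, and they make plain how strongly the outcome of a test depends on which interaction is taken as error.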


C. Discussion 


From the data of Tables 5 and 6 it seems 
clear that the rank order of judges, inter- 
vals, and formulae for contributing to re- 
liability would be the same as their rank 
order for contributing to the total sum of 
squares—number of judges first, number of 
intervals second, and formulae third. A 
possible cause for the low ranking of 
formulae lies in the design of the experi- 
ment. For since all the estimates computed 
from the formulae were based on the ogives 
for the nine judge-intervals combinations, 
the requirement of randomness is not fully 
satisfied; and the three estimates of scale 
and of ambiguity values are to some extent 
correlated. This is especially so for M₁ and M₃, since they have one percentile value in common, and for S₂ and S₃, since they have
two percentile values in common. It should 
be emphasized, however, that even when 
no adjustment is made for this correlation 
effect, the F ratios suggest significant differences; and the t tests between the most
correlated estimates are significant. 




It should be further noted that schools 
(sources of the data) have been confounded 
with intervals; but since there is no obvious 
reason to question the comparability of the 
three groups of students in regard to ability 
to judge the items, this is not a serious de- 
fect of the design. 

In the planning of this study the hope 
was entertained that the reliabilities for the 
various judge-interval-formula combina- 
tions would be such that combinations in- 
volving more efficient formulae and fewer 
judges and/or intervals would produce re- 
liabilities on a par with those obtained by 
using less efficient formulae and more 
judges and/or intervals. Such findings 
would suggest alternate methods of ob- 
taining scale and ambiguity values of a 
desired level of reliability with a reduction 
of tedium involved in collecting and proc- 
essing data. Considered from this practical 
viewpoint, estimates of scale value computed by the formula M₁ on the basis of the
judgments of 25 judges using a 5-interval 
scale appear to be as satisfactory as those 
computed by more efficient formulae on 
the sortings of more judges using more 
intervals. These results are in accord with 
the finding of other investigations (13, 
14). 

While the results for ambiguity values 
are not exactly what was hoped for, de- 
pending on the level of reliability desired, 
some choice of procedure seems evident. 
For example, if an investigator requires a 
reliability of .55 to .60 for his ambiguity 
values, estimates computed by formula S, 
on the basis of the judgments of 25 judges 
using 5 intervals would be satisfactory. 
Values with a reliability of approximately 
.75 could be obtained using 50 judges, 11
intervals, and formula 5,; or 50 judges, 7 
intervals, and formula §;; or perhaps even 
100 judges, 5 intervals, and formula 5S). 
Reliabilities of approximately .80 could be 
obtained using 50 judges, 11 intervals, and 


formula S;; or 100 judges, 7 intervals, and 
formula 


V. INTERCORRELATIONS FOR SCALE 
AND AMBIGUITY VALUES 


Before recommending the use of any 
judge-interval-formula combination which 
differs from the conventionally used com- 
bination of 11 intervals, approximately 100 
judges, and M₁ and S₁ formulae, a comparison should be made of the values obtained
by the conventional and possible alternate 
combinations concerning similarity of 
ordering of items, in addition to reliability. 
Data for such a comparison are provided 
by intercorrelations among values com- 
puted on the basis of various judge-inter- 
val-formula combinations. On the basis of 
the work of other investigators and data 
previously presented in this study, it would 
be expected that all intercorrelations 
among scale values would be high, while 
the intercorrelations among ambiguity 
values would be more variable. The pur- 
pose of this section was to determine the 
extent to which these expectations are con- 
firmed by experimental data. 


A. Procedure 


Rather than compute all possible inter- 
correlations, only those necessary to dem- 
onstrate these expectations were computed, 
In general, the principle was followed of 
correlating values of such judge-interval- 
formula combinations as seemed appropri- 
ate with the values obtained by the combi- 
nation conventionally employed and with 
values obtained by combinations which 
yield higher reliability than that of the con- 
ventionally used combinations. Accord- 
ingly, scale values computed by formulae 
M, M,, M2, and M; from sample 11-I1-100 
were correlated with the scale values ob- 
tained by formulae M,, M2, and M,, for 
sample 5-II-25. These are shown in the top 
half of Table 7. In addition, scale values 


10 SAM C, 


TABLE 7 
INTERCORRELATION BETWEEN SCALE VALUES 
oF SELECTED SAMPLES AND SCALE VALUES 

or SAMPLE 5-II-25 


Sample 5-Il-25 

Sample 

M, | M, M, 
11-Il-100 M .9807 | .9790 -9773 
Ms | .9820 | -9804 | .9793 
My 9513 -9796 -9785 
M, | .9823 9810 9817 
11- I-100 9959 | -9954 0048 
7- |-100 M -9951 } .9951 
I-100 M_ .9909 9993 


computed by the formula M on samples 11-I-100, 7-I-100, and
5-I-100 were correlated with scale values computed by formulae
M₁, M₂, and M₃ for sample 5-II-25. These are shown in the
bottom half of Table 7. For ambiguity values, estimates
computed by formulae S₁, S₂, and S₃ on sample 11-II-100 were
correlated with each other and with values computed by
formulae S₁, S₂, and S₃ for every interval-judge combination
for samples numbered II. In addition, values computed by
formulae S₁, S₂, and S₃ on sample 7-II-100 were correlated
with each other; and values similarly computed on sample
5-II-100 were correlated with each other. All these
intercorrelations are shown in Table 8.
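
The correlating step described above can be illustrated with a minimal sketch. It assumes each judge-interval-formula combination yields one value per item, held in equal-length lists; the function names and the dictionary layout are hypothetical and are not taken from the monograph.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlate_with_reference(reference, others):
    """Correlate one set of item values with each of several other sets.

    `others` maps a hypothetical combination label (for example
    "11-II-100 M") to its list of item values.
    """
    return {label: pearson_r(reference, values) for label, values in others.items()}
```

Each entry of the returned dictionary corresponds to one cell of a table such as Table 7.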


B. Results 


Scale Values. Estimates computed from sample 11-II-100 and
the mean values computed on samples 11-I-100, 7-I-100, and
5-I-100 correlate approximately .98 and .99, respectively,
with all values computed on the basis of sample 5-II-25.
Ambiguity Values. The intercorrelations 
for ambiguity values show considerable 
variability. These range from .98 to .57. 
But except where indices are correlated 
with other indices computed from the same 
judge-interval combination, no intercorre- 
lation is higher than .83. With few excep- 
tions 5S; correlates higher with S$; than with 
and correlates higher with S, than 
S;. In those instances where values com- 
puted on the 11-interval 100-judge combi- 
nation are correlated with values com- 
puted on other judge-interval combina- 
tions there is a general tendency for: (a) 
correlations between S₃ values to be highest, (b) correlations
between S₂ values to be higher than correlations between S₁ values,
(c) correlations with values computed on 
sortings of 50 judges to be lower than cor- 
relations with values based on 100 judges, 
and (d) correlations with values computed 
on sortings of 25 judges to be lower than 
correlations with values based on 50 judges. 


C. Discussion and Summary 


For scale values the intercorrelation coefficients seem high
enough to indicate that values computed by the formulae M₁, M₂,
or M₃ on sortings of 25 persons using 5


TABLE 8 
INTERCORRELATIONS BETWEEN AMBIGUITY VALUES FOR SELECTED 


100 50 100 
Se Si | Ss Se Si | Se Sp Si | Ss Ss S: | 
} 
Ss 88 | 83 81 78 | 70 67 62 | By 79 
| 
Sy 77 | 81 81 71 | 69 66 58 | Bo Bo 72 
S| | 77.77 69 | 60 64 | 75 67 bo 
? Ss | } So 
100 Sy So 
5 Sy 
S: | 


INTERVAL-JUDGE-FORMULAE COMBINATIONS


7 s 
| 50 25 100 | 50 25 
Ss Se S, Ss Ss S, S: S; Ss St Si Sa Se Si 


76 71 73 | 75 68 66 77 16 08 | Bo Bo 66 | 78 71 64 
72 68 66 | 75 68 64 | 74 74 64 | 78 80 62 | 68 68 61 
72 63 76 | 66 58 60 70 66 66 | 71 69 64 | 64 63 50 
| 
93 85 




intervals order the items in essentially the 
same way as do indices computed by these 
same or more reliable formulae on samples 
using more judges and more intervals. 

For ambiguity values, the values computed on the 11-interval,
100-judge sample correlate from .57 to .83 with values computed
on the basis of the other interval-judge combinations used in
this study. In view of the variability among these correlations,
it seems advisable to determine how much departure from the
ordering of items obtained from a particular
interval-judge-formula combination one will accept before
deciding what interval-judge-formula combination one might
use instead.


VI. RELATION BETWEEN SCALE AND 
AMBIGUITY VALUES 


The purpose of this section was twofold. 
First, an effort was made to evaluate the 
effects of sample size, number of intervals, 
and computing formulae on the magnitude 
of the relation between scale value and am- 
biguity values. Second, a comparison was 
made between the magnitude of the rela- 
tion between scale and ambiguity values 
derived from values obtained by the com- 
puting methods of this study and the mag- 
nitude of the relation for values obtained 
by the Thurstone and Chave computing 
methods. 


A. Procedure 


The fact that the relationship between scale and ambiguity
values is curvilinear made it impossible to employ a design
permitting an analysis of the variance among the various
indices of relationship into parts attributable to the three
independent variables about which interest was centered.
Instead the less satisfactory procedure was employed of
computing the unbiased correlation ratio ε of ambiguity values
on scale values for a number of interval-judge-formula
combinations, and of attempting to state on the basis of
inspection of the data what effect these variables might have
on the obtained relationships.
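
A minimal sketch of an unbiased correlation ratio follows. It assumes the statistic is Kelley's ε, computed as ε² = 1 − (1 − η²)(N − 1)/(N − k), where η² is the ordinary correlation ratio, N the number of items, and k the number of scale-value arrays; the monograph does not state its formula in this section, so this choice is an assumption. Carrying the sign of ε² over to ε, so that a correction larger than the obtained ratio gives a negative value, is likewise an assumed convention, and all function names are hypothetical.

```python
from math import copysign, sqrt


def correlation_ratio_sq(arrays):
    """Ordinary eta squared of y on x.

    `arrays` is a list of lists: the y values (e.g. ambiguity values)
    of the items falling in each x array (e.g. scale-value interval).
    """
    arrays = [a for a in arrays if a]
    values = [v for a in arrays for v in a]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("y values are constant")
    ss_between = sum(len(a) * (sum(a) / len(a) - grand_mean) ** 2 for a in arrays)
    return ss_between / ss_total


def unbiased_correlation_ratio_sq(arrays):
    """Epsilon squared under the assumed Kelley correction."""
    arrays = [a for a in arrays if a]
    n = sum(len(a) for a in arrays)
    k = len(arrays)
    if n <= k:
        raise ValueError("need more items than arrays")
    eta_sq = correlation_ratio_sq(arrays)
    return 1.0 - (1.0 - eta_sq) * (n - 1) / (n - k)


def signed_epsilon(epsilon_sq):
    """Square root of |epsilon squared|, keeping its sign (assumed convention)."""
    return copysign(sqrt(abs(epsilon_sq)), epsilon_sq)
```

When the arrays differ little in their means, the correction can exceed η² and ε² becomes negative, which is why a signed value is returned.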

ε was computed only for such interval-judge-formula
combinations as would be most likely to demonstrate the effects
of these variables on the magnitude of the relationship. In
view of the high reliability of, and high intercorrelations
among, scale
values computed by the various formulae, 
values computed by the M, formula were 
considered as satisfactory measures of scale 
value. S₁ and S₃ values were considered as sufficient for
computing formulae of ambiguity value, and groups of 100 and 25
judges were considered as sufficient for numbers of judges.
Accordingly the unbiased correlation ratios of ambiguity value
on scale value for samples I and II for selected judge-formula
combinations for the 5-, 7-, and 11-interval data were
computed. The results are shown in Table 9.
Edwards’ computation (1) based on the 


TABLE 9

UNBIASED CORRELATION RATIOS FOR S₁ AND S₃
AMBIGUITY VALUES ON SCALE VALUES
FOR SELECTED INTERVAL-JUDGE
COMBINATIONS


| | | 
In- | M 
terval| Sample | | 
5 I-100 | S; | 14 | .475 .689 
| Sy | 14 | «505 710 
| II-100 | 24 | 
| Ss 14 447 669 
I-25 Si 15 243 | 
I-25 Si 15 402 | .680 
7 I-100 S | 556 
Ss | 13 $21 
II-100 | S; 13 654 | .809 
Si 13 sor | «769 
I-25 Si 13 392 626 
Il-25 Si 13 104 440 
6| | S 11 482 | 
Ss | .538 +733 
II-100 | 1 .75§0 
Se |. | 
Il-25 |} a 178 | .422 
Thurstone and 
Chave data Si 1s 493 


Thurstone and Chave data is also shown in 
the table. 


B. Results 


As was anticipated all relationships were 
significantly curvilinear at a 1% level 
of confidence with low and high scale 
values being associated with low ambiguity 
values. 

On the basis of inspection only, the data of Table 9 seem to
suggest the following findings: (a) For the 5-, 7-, and
11-interval ε values for S₁ ambiguity values on scale values,
the values of ε computed on the basis of sortings of 100 judges
are higher than the values computed on the basis of the
sortings of 25 judges. (b) For samples of 100 judges, ε's for
S₃ ambiguity values on scale values do not appear to be
systematically higher than ε's for S₁ ambiguity values on
scale values. (c) The size of ε for S₁ ambiguity values on
scale values computed either on the basis of 25 or 100 judges
does not appear to be affected systematically by the number of
intervals employed in sorting the items. (d) The size of ε for
S₃ ambiguity values on scale values computed on the basis of
the sortings of 100 judges appears to vary directly with the
number of intervals employed in sorting the items. (e) All but
two of the 18 ε values computed in this study equal or exceed
the value obtained by Edwards on the basis of the Thurstone and
Chave data.


C. Discussion and Summary 


On the basis of these data the following tentative conclusions
appear appropriate concerning the effects of number of
intervals, number of judges, and computing formulae on the
unbiased correlation ratio of ambiguity values on scale values:
(a) The size of ε for S₁ ambiguity values on scale values
appears to be a function of the number of sortings on which the
values are based, but not a function of the number of intervals
employed in sorting the items. (b) For samples of 100 judges
the size of ε does not appear to be a function of the computing
formula used. (c) The size of ε for S₃ ambiguity values on
scale values computed on the basis of sortings of 100 judges
varies directly as the number of intervals used in sorting the
data. (d) The relation of ambiguity value on scale value based
on indices computed by the procedures used in this study is
higher than that obtained from values obtained by the method
used by Thurstone and Chave.

It should be emphasized that because of 
a lack of any sampling distribution against 
which to test specific hypotheses, this sum- 
mary can be regarded as no more than a 
statement which seems consistent with the 
empirical data, but which cannot be tested 
for significance. 


VII. RELATION TO SCALE AND AMBIGUITY
VALUES OBTAINED BY THURSTONE


It has previously been noted that instead of following the
procedure of Thurstone and Chave of extrapolating for P₅₀ for
estimating scale values and of doubling the quartile distance
for estimating ambiguity values for extreme items, the
procedure of this study involved dropping perpendiculars at
the required percentile points to obtain the values to be used
in the various computing formulae. The purpose of this section
was to evaluate what effect this procedural difference has on
the ordering of items in regard to scale and ambiguity values.

The procedure consisted in computing the following Pearson
product-moment correlations: (a) Scale values which Thurstone
and Chave obtained on the basis of 300 judges using 11
intervals vs. M₁ scale values obtained on the basis of samples
of 100 judges for the 11-, 7-, and 5-interval data of this
study (these are shown in column 3 of Table 10). (b) Ambiguity
values obtained by Thurstone and Chave vs. the




TABLE 10 


CORRELATIONS OF SELECTED SCALE AND AMBI- 
GUITY VALUES OF THIS STUDY WITH SCALE 
AND AMBIGUITY VALUES OBTAINED BY 
THURSTONE AND CHAVE 


Thurstone and Chave vs. Webb Data | Webb Data: Reliabilities, Webb I vs. II


Ir 
7 343 
.408 
5 +413 
369 


S₁ values obtained on the basis of samples of 100 judges for
the 11-, 7-, and 5-interval data of this study (column 4 of
Table 10). (c) M₁ values of sample I₁₀₀ vs. M₁ values of sample
II₁₀₀ for the 11-, 7-, and 5-interval data (column 5 of
Table 10). (d) S₁ values of sample I₁₀₀ vs. S₁ values of sample
II₁₀₀ for the 11-, 7-, and 5-interval data (column 6 of
Table 10).

Except for extreme items the computational formula for the
scale values for Thurstone and Chave and for this study would
be the M₁ of this study. For ambiguity values, except for
extreme items, the computational formulae differ by a constant
.5, since Thurstone and Chave used (P₇₅ − P₂₅), while this
study used .5(P₇₅ − P₂₅).
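
The percentile step can be sketched as linear interpolation along the ogive of one item's sortings. The sketch assumes that interval j spans j − 0.5 to j + 0.5 on the scale and that judgments are spread evenly within an interval; neither convention is stated in this section, and the special handling of extreme items is not reproduced. Function names are hypothetical.

```python
def percentile_from_sortings(freqs, p):
    """Scale position below which a proportion p of the sortings fall.

    freqs[j] is the number of judges who placed the item in interval
    j + 1. Interval j + 1 is assumed to run from j + 0.5 to j + 1.5.
    """
    total = sum(freqs)
    if total == 0 or not 0 < p < 1:
        raise ValueError("need at least one sorting and 0 < p < 1")
    target = p * total
    cumulative = 0.0
    for j, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            lower_boundary = j + 0.5
            return lower_boundary + (target - cumulative) / f
        cumulative += f
    return len(freqs) + 0.5  # reached only through floating-point rounding


def median_scale_value(freqs):
    """Scale value taken as P50 of the sortings."""
    return percentile_from_sortings(freqs, 0.50)


def interquartile_range(freqs):
    """P75 - P25, the form attributed above to Thurstone and Chave."""
    return percentile_from_sortings(freqs, 0.75) - percentile_from_sortings(freqs, 0.25)


def half_interquartile_range(freqs):
    """.5(P75 - P25), the form attributed above to this study."""
    return 0.5 * interquartile_range(freqs)
```

Because the two ambiguity indices differ only by the constant .5, they order non-extreme items identically; any difference in ordering must come from the treatment of extreme items.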


A. Results 


While the correlations of the Thurstone and Chave scale values
with the scale values of this study are all high
(approximately from .95 to .97), they are lower than the
correlations between the M₁ values of the two samples for the
11-, 7-, and 5-interval data. These range approximately from
.991 to .996.

The differences between (a) the correlations of the Thurstone
and Chave scale values with scale values computed by the method
of this study and (b) the correlation between the two sets of
scale values computed by the method of this study, were all
significant at a 1% level of confidence.

The correlations between the Thurstone and Chave ambiguity
values and the S₁ values of this study range from
approximately .34 to .51. These are all lower than the
correlations between the S₁ values of the two samples for the
11-, 7-, and 5-interval data. These range from .71 to .64. The
differences between the z transformations of the correlations
of the Thurstone and Chave ambiguity values with ambiguity
values computed by the method of this study, and the z
transformations of the correlation between the two sets of
values computed by the method of this study, were all
significant at a 1% level of confidence.
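
A comparison of two correlations through Fisher's z can be sketched as below. The sketch uses the textbook test for correlations from independent samples, with standard error sqrt(1/(n₁ − 3) + 1/(n₂ − 3)); the monograph does not give its test formula, and the correlations compared here share items, so this is an assumption and only an approximation to whatever procedure was actually used. Function names are hypothetical.

```python
from math import atanh, erf, sqrt


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return atanh(r)


def z_difference(r1, n1, r2, n2):
    """Critical ratio for the difference between two correlations.

    Treats the two correlations as coming from independent samples of
    n1 and n2 items (an assumption, see the lead-in).
    """
    standard_error = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / standard_error


def two_sided_p(z):
    """Two-sided normal probability for a critical ratio."""
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
```

As a usage example, `z_difference(0.95, 130, 0.991, 130)` compares a scale-value correlation of about .95 with a reliability of about .991, each taken over the 130 items; the resulting critical ratio lies well beyond the 1% point of the normal curve.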


B. Discussion and Summary 


The results show a significant difference 
between ordering of items obtained by 
Thurstone and Chave and the ordering 
obtained in this study in respect to both 
scale and ambiguity values. When the co- 
efficients are converted to Fisher’s z, the 
differences in regard to scale value appear, 
in general, to be larger than the differences 
in regard to ambiguity values. 

Using computational procedures which were apparently the same
as those of this study, Edwards and Kenny (2) rescaled the
items of Thurstone and Chave. The scale and ambiguity values of
their scaling correlated .95 and .18 with the scale and
ambiguity values, respectively, of Thurstone and Chave. Ten
years after the construction of the Thurstone-Peterson Scale of
Attitude toward War, Farnsworth (5), using the same procedures
as the original authors, rescaled the 20 items of Form A.
While the differences in scale values of the two scalings were
significant at a 5% level of confidence for 15 of the 20 items,
the ordering of items for the two scalings gave a
rank-difference correlation of .988.
While it is not possible to determine 
whether the differences of this study are 
caused by a change in computational pro- 
cedure or by a change of the cultural milieu 
in terms of which the items are evaluated, 
the results of this study plus those of these 
other investigators suggest that the change 
in computational procedure has some part 
in the changing of the order. This sugges- 
tion follows from the fact that a lower cor- 
relation was obtained between the scale 
values of the two scalings when, as in this 
study and in that of Edwards and Kenny, 
both computational procedure and cultural milieu were changed,
than was obtained when, as in the experiment of
Farnsworth, only cultural milieu was 
changed. Even this suggestion must be 
viewed with caution, since the differences 
between the findings of Farnsworth and 
those of this study and of Edwards and 
Kenny may possibly be a result of differing 
numbers of items or differing item content. 
Additional evidence in support of the 
effect of computational procedure on order 
is provided by the data regarding the rela- 
tion between scale and ambiguity values 
discussed in the preceding section. 


VIII. RELEVANCE OF SCALE AND AMBIGUITY
VALUES FOR CONSTRUCTING
GUTTMAN-TYPE SCALES


Edwards and Kilpatrick (3, 4) have proposed the
scale-discrimination technique for selecting a set of attitude
items which will have a high probability of having satisfactory
reproducibility when tested by the Guttman technique. The
procedure involves both the Thurstone and Likert techniques of
scaling and essentially consists of eliminating the half of the
initial set of items with the highest ambiguity and of choosing
from the remaining items the desired number of items in each
scale-value interval which have the highest discrimination
values.
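
The two selection steps just described can be sketched as follows. The sketch assumes each item carries a scale value, an ambiguity value, and a discrimination value, and that scale-value intervals are one unit wide and identified by the whole-number part of the scale value; the dictionary keys, the interval width, and the function name are hypothetical.

```python
from collections import defaultdict
from math import floor


def scale_discrimination_select(items, per_interval):
    """Pick items by the two steps of the scale-discrimination technique.

    items: list of dicts with keys "id", "scale", "ambiguity",
    "discrimination". Returns {interval: [item ids]}.
    """
    # Step 1: drop the half of the items with the highest ambiguity.
    ranked = sorted(items, key=lambda item: item["ambiguity"])
    kept = ranked[: (len(ranked) + 1) // 2]

    # Step 2: within each scale-value interval, keep the items with
    # the highest discrimination values.
    by_interval = defaultdict(list)
    for item in kept:
        by_interval[floor(item["scale"])].append(item)

    selected = {}
    for interval in sorted(by_interval):
        best = sorted(
            by_interval[interval],
            key=lambda item: item["discrimination"],
            reverse=True,
        )
        selected[interval] = [item["id"] for item in best[:per_interval]]
    return selected
```

An interval left empty after the ambiguity cut simply does not appear in the result, which is one way the neutral range can end up unrepresented.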

The rationale of the technique is based 
on the logically derived generalizations 
that scale values of the Thurstone scaling 
process and discrimination values of the 
Likert scaling process are respectively re- 
lated to the cutting points and reproduci- 
bility of an item in the Guttman-type 
method of scale analysis. 

Since these generalizations were verified 
on small groups of items which had previ- 
ously been selected by the Thurstone or 
Likert scaling technique, and since the 
efficacy of the technique was later verified 
by application to a basic set of items which 
contained very few items in the neutral 
range, only meager evidence of the extent 
or magnitude of these relationships is pro- 
vided. 

The purpose of this study therefore was 
to investigate further the interrelations 
among the various indices involved in the 
scaling process—namely, scale values, am- 
biguity values, discrimination values, cut- 
ting points, and item reproducibility. Re- 
sults of such an investigation should be 
helpful in assessing the relative importance 
of the Thurstone- and Likert-type values 
for selecting items which would be scalable 
according to the Guttman criteria. 

In order to obtain results of maximum utility, it seemed
desirable that the procedures employed should meet the
following requirements: (a) provide quantitative results,
(b) include in the analysis all of the initial set of items and
not merely those few which would be selected for scaling, and
(c) include enough neutral items to prevent gaps in the
distributions and thus make for greater reliability in
determining curvilinearity or linearity of regression lines.

Conditions a and b require that for each of the involved items
there be an index of each value expressed in quantifiable form.
Such values are readily available for scale, ambiguity, and
discrimination values. Scale and ambiguity values are obtained
by the Thurstone technique, while the φ coefficients which
discriminate between the X% highest and X% lowest on total
score vs. a suitable dichotomization of item responses serve as
a discrimination index. But indices of cutting points and item
reproducibility are arrived at only after a subset of items has
been selected and even then by a method of successive
approximation. It was therefore necessary to explore the
possibility of finding a reasonable index of cutting point and
reproducibility which could be obtained without employing the
conventional Guttman-type analysis.
Though an index for item reproduci- 
bility which would satisfactorily meet these 
requirements could not be devised, a usa- 
ble index of cutting point which met these 
requirements seemed available. Consider 
the fictitious data of Table 11 which shows 
a display of responses for one item tabu- 


TABLE 11 


DISPLAY OF FICTITIOUS DATA FOR A SINGLE
ITEM TABULATED BY THE CORNELL
TECHNIQUE


Rank 


Item Response Categories 


Subjects’ Undecided | Disagree 
Scores 3 4 5 


lated according to the Cornell technique 
for performing scale analysis. The left-hand column shows the
rank order of subjects according to score; opposite each score
an “X” indicates the subject’s response to the item. The
cutting point refers to “that place in the rank order of
subjects where the most common response
shifts from one category to the next” (4, 
p. 103). In this example cutting points 
could be established at the appropriate 
points which mark the shift in predomi- 
nant response from 1 to 2, 2 to 3, 3 to 4, and 
4 to 5. But if, as a means of reducing error, 
categories for which responses intermingle 
are reduced to the extent that only two 
response categories remain, there would be 
only one cutting point. For this example 
the cutting point would fall between scores 
17 and 16. The most appropriate place for establishing the
dichotomy would be between response categories 2 and 3. Now
since the decisions of where to establish the 
cutting point and the dichotomization 
point are made jointly on the basis of the 
patterning of responses, and since the de- 
termination of the point of dichotomiza- 
tion determines the marginal total or per- 
centage of responses falling in each of the 
two response categories, it follows that the 
cutting point and the marginal total are 
related. Since the marginal total would in- 
clude responses which, from the standpoint 
of reproducibility, constitute errors, the re- 
lation would not be perfect. But since 
minimization of error is one criterion con- 
sidered in establishing a cutting point, the 
effect of this disturbance on the marginal 
total as an index of cutting point should 
not be large. 

A proper determination of the cutting point and point of
dichotomization can be made only from an array of data such as
is shown in Table 11. But in
view of the relations described above, it 
appears possible that an estimate of ac- 


20 xX | 
20 xX 
19 xX } 
19 
18 
18 } xX | 
17 xX 
17 xX 
17 xX 
16 X 
16 i X 
15 xX 
15 xX 
14 xX 
14 xX | 
14 xX 
13 xX 
13 | xX 
12 | X 
12 x 




ceptable points of dichotomization could 
be made on the basis of an inspection of the 
distribution of responses used for comput- 
ing discrimination values. 

Since, as a general rule, the point of 
dichotomization employed in the compu- 
tation of the discrimination values would 
probably be made so as to equalize the per- 
centage of responses in each response cate- 
gory for each item, this dichotomization 
point should serve as a satisfactory one for 
establishing marginal totals as well. In this 
study, therefore, the percentage of response 
on one side of the point of dichotomization 
used in computing discrimination values 
has been used as an index of cutting point. 

Since a suitable index of item reproduci- 
bility was not devised, the study has been 
limited to an investigation of the interrela- 
tionships among scale values, ambiguity 
values, discrimination values, and cutting- 
point indices. 


A. Procedure 


Indices of scale and ambiguity values 
used were the M, and S, values computed 
on the basis of sample 11-II-100. The computation of these has
been described in
section III. 

Discrimination values and cutting-point 
indices were computed on the basis of re- 
sponses collected by the Likert technique 
from 304 students enrolled in sociology and 
psychology classes at the Atlanta Division 
of the University of Georgia. 

Items were ordered according to the Thurstone and Chave scale
values; all items having scale values below 5.3 were judged to
be favorable items, and all items above that point were judged
to be unfavorable items. Scoring weights ranging from 0 for a
response of strongly agree to 4 for a response of strongly
disagree were assigned to favorable items; reversed weights
were assigned to unfavorable items. Each subject’s responses
were scored with these weights. The scores ranged from 394 to
75, with a median of 164. Students were divided into upper and
lower halves; and for each half the frequency of response in
each response category was tabulated.

Next, for the total group of subjects the distributions of
responses in the five response categories for each item were
considered in order to determine where to place the point of
dichotomization. Inspection of the data suggested it should be
between 1 and 2 or between 2 and 3. For each item the
percentage of subjects responding 0 and 1, and 0, 1, and 2, was
computed. These two sets of percentages correlated .966; but
since the percentages based on responses 0, 1, and 2 gave a
more evenly spread distribution, the point of dichotomization
was chosen to fall between 2 and 3. The percentages of
responses falling in the 0, 1, and 2 categories were considered
as the marginal totals or cutting-point indices. After dividing
the total group into upper and lower halves according to score
and at the point of dichotomization of item response, φ
coefficients were computed by the use of Jurgensen’s (6) table.
These were considered as discrimination values.
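
The two item indices can be sketched directly from scored responses. The sketch computes φ from the fourfold table by formula rather than from a published table, and assumes the sign convention that φ is positive when the upper half more often gives responses above the point of dichotomization; that convention and the function names are hypothetical.

```python
from math import sqrt


def cutting_point_index(responses, cut=2):
    """Percentage of scored responses (0 to 4) falling at or below `cut`."""
    if not responses:
        raise ValueError("no responses")
    return 100.0 * sum(1 for r in responses if r <= cut) / len(responses)


def phi_coefficient(a, b, c, d):
    """Phi for the fourfold table [[a, b], [c, d]]."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a marginal total is zero")
    return (a * d - b * c) / denominator


def item_discrimination(upper_half, lower_half, cut=2):
    """Phi between total-score half and dichotomized item response.

    upper_half / lower_half: scored responses to one item from the
    subjects in the upper and lower halves on total score.
    """
    a = sum(1 for r in upper_half if r > cut)   # upper half, above cut
    b = len(upper_half) - a                     # upper half, at or below cut
    c = sum(1 for r in lower_half if r > cut)   # lower half, above cut
    d = len(lower_half) - c                     # lower half, at or below cut
    return phi_coefficient(a, b, c, d)
```

An item whose φ comes out negative under this convention is one for which reversing the scoring weights would give a positive value.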

Finally, selected correlations among the 
scale, ambiguity, cutting-point, and dis- 
crimination values were determined. 

Since most of the relationships were 
curvilinear, it was necessary to decide 
which variable should be considered inde- 
pendent and which dependent; and since 
discrimination and cutting-point values 
were the best available indices of item re- 
producibility and cutting points, respec- 
tively, unbiased correlation ratios have 
been computed for the regression of these 
values on the other indices. The results are 
shown in Table 12. For each relation the 
regression line used was that of the first- 
named value on the second. 

A good deal of the curvilinearity in the 
scattergrams for discrimination value vs. 




TABLE 12
SELECTED UNBIASED CORRELATION RATIOS BETWEEN SCALE, AMBIGUITY,
CUTTING-POINT, AND DISCRIMINATION VALUES


Variables 


Shape of 
Regression 


Equation Equation 


Discrimination value vs. cutting point 
Discrimination value vs. scale value 
Discrimination value vs. ambiguity value 
Cutting point vs. scale value 

Cutting point vs. ambiguity value 
Ambiguity value vs. cutting point 
Ambiguity value vs. scale value 


| Curvilinear 
.192 458 Curvilinear 
.048 .210 Linear 


Curvilinear 
Curvilinear 
| Curvilinear 
Linear 
Linear 
Curvilinear 
| Curvilinear 


* Significant at 5% level of confidence.
† Significant at 1% level of confidence.


the other indices seemed to result from the fact that the
discrimination values for 12 items were negative. For these
items the scoring weights were reversed so as to give positive
correlations. The φ’s were recomputed, scattergrams were
replotted, and ε recomputed. The values so obtained are
recorded under the label of “reflected values.” These should be
considered in evaluating the relation of discrimination values
to the other variables.


B. Results 


It will be observed that all of the reported relationships
except three are significant at a 5% or higher level of
confidence and that the regression lines for all but three
depart significantly from linearity at a 1% level of
confidence.

Considering the “not reflected” data, the scattergram shapes of
discrimination value vs. scale value and ambiguity value were V
and 7 shaped, respectively. These types of shapes seemed to
result from the fact that all negative discrimination values
were for items in the middle range of scale values and
consequently for items of high ambiguity. The regression line
of discrimination value vs. cutting points was U shaped.
However, when the scoring of items with negative discrimination
values was reflected so as to produce positive discrimination
values, the regression of discrimination value vs. ambiguity
did not depart significantly from linearity at a 5% level of
confidence. The regression on scale value, however, remains
roughly U shaped with lowest φ’s falling in the scale value
range of 4.0-4.9.

The regression line for discrimination values vs. cutting-point
index also has something of a U shape, though since the items
in the range 0.0-0.9 for cutting points had the lowest average
φ, the curve actually has 2 flex points. Items of highest φ
values were in the range 1.0-5.0 and 8.0-9.0.

The regression lines of cutting point vs. scale value and
ambiguity value do not depart significantly from linearity. The
relation of cutting point to ambiguity value was essentially
zero, but the relation with scale value was relatively high.
The obtained Pearson product-moment r was −.888. These data
suggest no relation of cutting point on ambiguity, but since,
as the data show, there is a significant curvilinear regression
of ambiguity value on cutting point, the ε of −.138* for
cutting point on ambiguity value does not provide a full
account of the relationship between these two variables.
Actually, the regression line of ambiguity value on cutting
point is U shaped with the U lying on its side with the open
end on the left. This means that given a low ambiguity value
one can predict a high or low cutting-point index, and given a
high ambiguity one can predict a cutting point of middle range.

* The negative sign results from the fact that in the process
of correcting ε for bias, the amount of correction was larger
than the obtained ε.

As previously reported, the regression of 
ambiguity on scale value was inverted U 
shaped, and the correlation ratio was sig- 
nificant at a 1% level of confidence. 


C. Discussion and Summary 


The results of this study demonstrate the 
following relationships: (a) an insignificant 
regression of discrimination and cutting- 
point values on ambiguity values, (b) a 
significant but fairly low curvilinear corre- 
lation of discrimination values on cutting- 
point and on scale values, (c) a moderately 
high curvilinear correlation for ambiguity 
on scale values, and (d) a high linear correlation between
cutting points and scale values.

The low relation of ambiguity value to 
cutting-point and discrimination values 
together with the high relation of ambigu- 
ity to scale values suggests that knowledge 
of ambiguity values adds little over and 
above what is provided by scale value to 
the selection of a scalable set of items. 

Further, the high relation of scale values 
to cutting-point values suggests that the 
latter could be used as a satisfactory index 
of ordering of items and thus obviate the 
necessity of obtaining scale values at all. 

The fact that the discrimination values 
have a fairly low correlation with cutting- 
point and scale values suggests that this 
index measures something different from 
the other two and should be retained in the 
process of selecting items. 

On the basis of these statements, it seems possible to infer,
as Edwards and Kilpatrick (4) have suggested, that items may
satisfactorily be selected on the basis of data collected by
the Likert technique alone. This would mean that the procedures
of the scale-discrimination technique recommended by Edwards
and Kilpatrick could be reduced to the computation of
cutting-point and discrimination values from data collected by
the Likert method, plotting discrimination vs. cutting-point
values, and the selection of the desired number of items with
highest discrimination values within each cutting-point index
interval.

An inspection of the scattergram of scale 
value vs. cutting-point value suggests the 
possible difficulty of determining which 
items, if only cutting-point value were 
known, would fall in the neutral range in 
the Thurstone-technique sense. However, with a set of items
edited so as to include one or two items containing such
specific determiners as “neutral,” “indifferent to,” or “don’t
care either way” to locate the neutral point in the neutral
range, this difficulty should not arise.

It should be pointed out that the finding reported in relation
to the relatively small contribution of ambiguity value over
and above that of scale values is in terms of values computed
by the procedures of this study. It seems probable, however, in
view of the results of Edwards and Kilpatrick, that essentially
the same results would have been obtained had the computational
procedure of Thurstone been used.


IX. SUMMARY


This monograph reports the results of a series of studies
concerning scale and ambiguity values obtained by the method of
equal-appearing intervals. These studies were primarily
concerned with the following topics: (1) the effects of number
of intervals, number of judges, and efficiency of the
estimating formulae on (a) the reliability of scale and
ambiguity values, (b) the intercorrelations for scale and for
ambiguity values, and (c) the curvilinear relations between
scale and ambiguity values; (2) a comparison of scale and
ambiguity values computed by the method of this study with
values obtained by the Thurstone and Chave method; and (3) the
relevance of scale and ambiguity values for selecting items for
Guttman-type scales. The 130 items of attitude toward the
church first employed by Thurstone and Chave were used in the
study.

For the first-named topic the procedure 
involved the computation of scale and am- 
biguity values from various interval-judge- 
formula combinations, and analyzing the 
effects of these variables on the three 
aspects of scale and ambiguity values under 
consideration. Three different numbers of 
scale intervals (11, 7, and 5); three differ- 
ent numbers of judges (100, 50, and 25); 
four different formulae for computing scale 
values (the mean and three estimates of the 
mean based on percentile values); and four 
different formulae for computing ambi- 
guity values (the standard deviation and 
three estimates of the standard deviation 
based on percentiles) were employed. Per- 
centile values for use in the computing 
formulae were obtained from ogives by 
dropping perpendiculars from the specified 
points. This procedure differed from that of 
Thurstone and Chave in the treatment of 
extreme items since for these items these 
investigators extrapolated for Py and 
doubled the quartile deviations for esti- 
mating ambiguity values. 
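
The percentile step described above can be illustrated with a minimal Python sketch. It is not the monograph's own computing formulae, which are not reproduced in this summary. It assumes that interval k spans k - 0.5 to k + 0.5, that the ogive is linear within an interval, and that the median and the interquartile range serve as the scale and ambiguity values. The function names and the frequencies are hypothetical.

```python
def percentile_from_ogive(freqs, p):
    """Read the scale position below which proportion p of the sortings fall.

    freqs[i] is the number of judges who placed the item in interval i + 1.
    Assumptions of this sketch: interval i + 1 spans (i + 0.5, i + 1.5) and
    the cumulative curve (ogive) is linear within each interval.
    """
    total = sum(freqs)
    target = p * total
    cumulative = 0.0
    for index, count in enumerate(freqs):
        if count > 0 and cumulative + count >= target:
            lower_limit = index + 0.5
            # Linear interpolation inside the interval containing the target.
            return lower_limit + (target - cumulative) / count
        cumulative += count
    # Fallback when no interval contains the target (for example, empty data).
    return len(freqs) + 0.5


def scale_value(freqs):
    """Median of the sortings (assumed here as the scale value)."""
    return percentile_from_ogive(freqs, 0.50)


def ambiguity_value(freqs):
    """Interquartile range of the sortings (assumed here as the ambiguity value)."""
    return percentile_from_ogive(freqs, 0.75) - percentile_from_ogive(freqs, 0.25)


# Hypothetical sortings of one item by 100 judges into 11 intervals.
sortings = [0, 0, 10, 20, 40, 20, 10, 0, 0, 0, 0]
print(scale_value(sortings), ambiguity_value(sortings))  # prints 5.0 1.5
```

For an item sorted almost entirely into the first or last interval, a quartile read in this way can fall at or near the end of the scale, which is the situation in which the treatment of extreme items matters.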

For the second topic, the orderings of
scale and ambiguity values obtained by the
procedures of this study were compared
with the orderings obtained by
Thurstone and Chave.

The procedure for the third topic in- 
volved an analysis of interrelationships of 
scale and ambiguity values computed by 
the method of equal-appearing intervals, 
and discrimination values and cutting- 
point indices obtained from data collected 
by the method of summated ratings. 


The major findings of these studies were 
as follows: 

For scale values, reliability increases 
with successively more efficient computing 
formulae. 

While there was a tendency for ambi- 
guity values computed from increasingly 
efficient computing formulae to be more re- 
liable, the reliability of the standard devia- 
tion was less than that of the most efficient 
percentile computing formula used. In two 
out of three cases it was less than that of the 
least efficient formula used. 

The results of a three-dimensional fac- 
torial design of Fisher’s z transformation of 
reliability coefficients involving three num- 
bers of intervals, three numbers of judges, 
and three percentile computing formulae, 
indicated that for scale values the judges- 
by-intervals interaction and the three main 
effects were significant sources of variance. 
Tests of significance among the various
levels for the three main effects showed sig-
nificantly increasing reliability with in-
creasing numbers of intervals, with in-
creasing numbers of judges, and with in-
creasingly efficient formulae. However,
since the reliability coefficients for all
judge-interval-formula combinations were
.976 or higher, these differences are of
little practical significance.
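
As an illustration of the transformation applied to the reliability coefficients before the analysis of variance, Fisher's z is the inverse hyperbolic tangent of r, that is, z = 0.5 ln((1 + r) / (1 - r)). The sketch below uses only the Python standard library. The two coefficients are hypothetical values chosen to fall in the range reported, not figures from the study.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient r, with -1 < r < 1."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Convert a z value back to the correlation scale."""
    return math.tanh(z)


# Hypothetical reliability coefficients near the upper limit of the scale.
low, high = 0.976, 0.990
print(high - low)                      # difference on the r scale
print(fisher_z(high) - fisher_z(low))  # difference on the z scale
```

Because z stretches the region near 1.00, differences among very high coefficients that are slight on the r scale become much larger on the z scale, which is consistent with effects that are statistically significant yet of little practical importance.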

The results from similar analyses of am- 
biguity values showed that all second-order 
interactions were significant at the .001 level
of confidence. There was, in addition, evi- 
dence to suggest that the three main effects 
were significant sources of variance. While 
all differences among the various levels for 
the main effects were in the expected direc- 
tions, when tested against the appropriate 
first-order interaction as error, they were 
not all significant. 

Intercorrelations computed among scale
values for selected judge-interval-formula
combinations indicated that the order-
ings of scale values computed on all the
various judge-interval-formula combina-
tions were for practical purposes approxi-
mately the same, since all intercorrelations
were .98 or higher.

Intercorrelations of ambiguity values for 
selected judge-interval-formula combina- 
tions showed considerable variability and 
suggested the advisability of determining 
how much departure from the ordering of 
items obtained from a preferred interval- 
judge-formula combination one will ac- 
cept before using values computed on the 
basis of some other combination. 

An inspectional analysis of the unbiased 
correlational ratios of ambiguity values on 
scale values, computed from data for se-
lected interval-judge-formula combina-
tions, suggested that the size of the relation
for S; ambiguity values on M, scale values 
is a function of the number of sortings on 
which the values are based. For values 
computed on the basis of sortings of 100 
judges the size of the correlation ratio does
not appear to be a function of the computing
formula; but the size of the ratio for S;
values on M, scale values
appears to vary directly with the number 
of intervals. 
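
A correlation ratio of ambiguity values on scale values can be sketched as follows. This summary does not give the formula used in the study, so the sketch assumes the usual squared correlation ratio (between-class sum of squares over total sum of squares) and the correction commonly attributed to Kelley, 1 - (N - 1)(1 - eta squared) / (N - k), for N items in k scale-value classes. The grouping and the values are hypothetical.

```python
def correlation_ratio_squared(groups):
    """Squared correlation ratio: between-class over total sum of squares.

    groups is a list of lists; each inner list holds the ambiguity values of
    the items that fall in one class of scale values.
    """
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


def unbiased_correlation_ratio_squared(groups):
    """Correction for the number of classes (assumed form, see the lead-in)."""
    n = sum(len(group) for group in groups)
    k = len(groups)
    eta_squared = correlation_ratio_squared(groups)
    return 1 - (n - 1) / (n - k) * (1 - eta_squared)


# Hypothetical ambiguity values grouped into three scale-value classes.
classes = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(correlation_ratio_squared(classes))
print(unbiased_correlation_ratio_squared(classes))
```

The correction reduces the raw ratio, and the reduction is larger when the number of classes is large relative to the number of items.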

The ordering of scale and ambiguity
values obtained by the methods employed
in this study differs significantly from the
ordering of scale and ambiguity values ob-
tained by Thurstone and Chave. However,
it was not possible to tell conclusively
whether the differences were a result of
differences in procedure or differences in
the cultural setting within which the judg-
ments were made.

Because of the high relation of ambi-
guity value on scale value, the high relation
of cutting-point index on scale value, and
the low correlation of cutting-point index on
ambiguity value, it appears that cutting-point
indices can be substituted for scale values,
thereby eliminating the need for scale
and ambiguity values in selecting items for
Guttman-type scales.


REFERENCES 


1. Edwards, A. L. A critique of "neutral" items
in attitude scales constructed by the method
of equal-appearing intervals. Psychol. Rev.,
1946, 53, 159-169.

2. Edwards, A. L., & Kenney, K. C. A com-
parison of the Thurstone and Likert tech-
niques of attitude scale construction. J.
appl. Psychol., 1946, 30, 72-83.

3. Edwards, A. L., & Kilpatrick, F. P. Scale
analysis and the measurement of social atti-
tudes. Psychometrika, 1948, 13, 14.

4. Edwards, A. L., & Kilpatrick, F. P. A tech-
nique for the construction of attitude scales.
J. appl. Psychol., 1948, 32, 374-384.

5. Farnsworth, P. R. Shifts in the value of
opinion items. J. Psychol., 1943, 16, 125-128.

6. Jurgensen, C. E. Table for determining phi
coefficients. Psychometrika, 1947, 12, 17-29.

7. Kelley, T. L. Fundamentals of statistics. Cam-
bridge, Mass.: Harvard Univer. Press, 1947.

8. Mosteller, F. On some useful inefficient sta-
tistics. Ann. math. Statist., 1946, 17, 377-408.

9. Thurstone, L. L. Attitudes can be measured.
Amer. J. Sociol., 1928, 33, 529-554.

10. Thurstone, L. L., & Chave, E. J. The measure-
ment of attitude. Chicago: Univer. of Chicago
Press, 1929.

11. Webb, S. C. A generalized scale for measuring
interest in natural science subjects. Educ.
psychol. Measmt, 1951, 11, 456-469.

12. Webb, S. C. Scaling of attitudes by the method
of equal-appearing intervals: a review. J.
soc. Psychol., in press.

13. Webb, S. C. Irregularities in judgment data
collected by the method of equal-appearing
intervals. J. abnorm. soc. Psychol., 1954, 49,
415-418.

14. Wicker, M. P. A comparison of attitude scale
values yielded by scales of differing lengths.
Unpublished master's thesis, Univer. of
North Carolina, 1950.

15. Yost, E. K. Joint estimation of mean and
standard deviation by percentiles. Unpub-
lished master's thesis, Univer. of Oregon,
1948.


(Accepted for publication September 2, 1954) 


